Claims

1. A method for gene mapping from chromosome and phenotype data, which
utilizes linkage disequilibrium between genetic markers m_i, which are polymorphic
nucleic acid or protein sequences or strings of single-nucleotide polymorphisms
deriving from a chromosomal region, wherein

i) all marker patterns P that satisfy a pattern evaluation function e(P) are
searched from the data, wherein

a. the marker patterns are expressions involving the genetic markers and their
alleles and zero or more of the following: individual covariates, environmental
variables and auxiliary phenotypes; and

b. the pattern evaluation function e(P) involves some statistical measure of
the association between the marker pattern P and the phenotype being
studied,

ii) each marker m_i of the data is scored by a marker score s(m_i), which is a
function of the set S_i defined as the set of marker patterns overlapping the marker
m_i and satisfying the pattern evaluation function e as defined in step (i), and

iii) the location of the gene is predicted as a function of the scores s(m_i) of all the
markers m_i in the data and is based on maximizing the score if the scoring
function is designed to give higher scores closer to the gene, and on minimizing
the score if the scoring function is designed to give lower scores closer to
the gene, as is the case for instance when the scores s(m_i) are marker-wise p
values.
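Steps (ii) and (iii) can be illustrated with a short Python sketch. It is a minimal example under stated assumptions, not the claimed implementation: each accepted pattern is reduced to a hypothetical (first, last) span of marker indices, the default marker score is the size of S_i, and the names marker_scores and predict_location as well as the toy data are invented for illustration.

```python
from typing import Callable, List, Sequence, Tuple

# Hypothetical representation: an accepted pattern is reduced to the
# (first, last) indices of the markers it overlaps.
Span = Tuple[int, int]


def marker_scores(n_markers: int, accepted: Sequence[Span],
                  aggregate: Callable[[List[Span]], float] = len) -> List[float]:
    """Step (ii): s(m_i) = aggregate(S_i), where S_i holds the accepted
    patterns overlapping marker i. The default aggregate is the set size."""
    scores = []
    for i in range(n_markers):
        s_i = [span for span in accepted if span[0] <= i <= span[1]]
        scores.append(aggregate(s_i))
    return scores


def predict_location(positions: Sequence[float], scores: Sequence[float],
                     higher_is_closer: bool = True) -> float:
    """Step (iii): map position of the marker with the extreme score."""
    pick = max if higher_is_closer else min
    best = pick(range(len(scores)), key=lambda i: scores[i])
    return positions[best]


positions = [0.0, 1.5, 3.0, 4.5, 6.0]        # toy marker map positions
accepted = [(0, 2), (1, 3), (2, 3), (2, 4)]  # spans of patterns satisfying e(P)
scores = marker_scores(len(positions), accepted)
location = predict_location(positions, scores)
```

When the scores are marker-wise p values, the same function would be called with higher_is_closer=False so that the smallest score is selected.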

2. A method of claim 1, wherein the chromosome data consists of either haplotypes
or genotypes.

3. A method of claim 1, wherein the haplotypes and genotypes referred to in the
marker patterns contain flexible regions such as gaps or disjunctions.

4. A method of claim 1, wherein the marker patterns P are searched by the
following algorithm:

Input

• set U of marker patterns

• evaluation function e(P) for patterns P in U

• (generalization) relation < for patterns in U

• where the function e and the relation < are such that if e(P) is true and P' < P,
then e(P') is also true

Output

• set S = {P in U | e(P) is true} of patterns

Method

1. S := {}
2. // Initialize the set of evaluated patterns:
3. E := {}
4. // Start with the most general patterns:
5. Gen := {P in U | there is no P' in U, P' != P, such that P' < P}
6. // Recursively evaluate patterns in a depth first order:
7. foreach P in Gen { evaluatePatterns(P) }
8. end;

9. procedure evaluatePatterns(P) {
10.   insert P into the set E
11.   if e(P) = true then {
12.     insert P into set S
13.     // Find all specializations of P that have not been tested yet, and
14.     // evaluate them recursively:
15.     Spec := {P' in U-E | P < P', P' != P, and there is no P" in U-E, P" != P
16.       and P" != P', with P < P" < P'};
17.     foreach P' in Spec { evaluatePatterns(P'); }
18.   }
19. }
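The depth-first search of this claim can be sketched in Python as follows. This is a minimal illustration under assumptions that are not in the claim: patterns are modelled as frozensets of strings, the relation P' < P is taken to be the proper-subset relation, and U is a small explicit set so that immediate specializations can be found by brute force. The guard that skips already evaluated members of Spec is an addition made here to avoid redundant work.

```python
from itertools import combinations
from typing import Callable, FrozenSet, Set

Pattern = FrozenSet[str]  # assumed representation of a marker pattern


def more_general(p: Pattern, q: Pattern) -> bool:
    """Assumed relation p < q: p is a proper subset of q."""
    return p < q


def search_depth_first(universe: Set[Pattern],
                       e: Callable[[Pattern], bool]) -> Set[Pattern]:
    accepted: Set[Pattern] = set()   # S in the claim
    evaluated: Set[Pattern] = set()  # E in the claim

    def evaluate(p: Pattern) -> None:
        evaluated.add(p)
        if not e(p):
            return  # by the property of e, no specialization of p can satisfy it
        accepted.add(p)
        untested = universe - evaluated
        # Immediate specializations: nothing untested lies strictly between p and q.
        spec = [q for q in untested if more_general(p, q)
                and not any(more_general(p, r) and more_general(r, q)
                            for r in untested)]
        for q in spec:
            if q not in evaluated:  # may have been reached through another branch
                evaluate(q)

    most_general = [p for p in universe
                    if not any(more_general(q, p) for q in universe)]
    for p in most_general:
        evaluate(p)
    return accepted


items = ["a", "b", "c"]
universe = {frozenset(c) for n in range(1, 4) for c in combinations(items, n)}
result = search_depth_first(universe, lambda p: "c" not in p)
```

In this toy run every pattern containing "c" fails e, so the search never descends below such a pattern.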

5. A method of claim 1, wherein the marker patterns P are searched by the
following algorithm:

Input

• set U of marker patterns

• evaluation function e(P) for patterns P in U

• frequency threshold x

Output

• set S = {P in U | e(P) and ae(P) is true} of patterns, where ae(P) is true if and
only if the frequency of pattern P exceeds a given threshold x

Method

20. S := {}
21. // Initialize the set of evaluated patterns:
22. E := {}
23. // Start with the most general patterns:
24. Gen := {P in U | there is no P' in U, P' != P, such that P -> P'}
25. // Recursively evaluate patterns in a depth first order:
26. foreach P in Gen { evaluatePatterns(P) }
27. end

28. procedure evaluatePatterns(P) {
29.   insert P into the set E
30.   if ae(P) = true then {
31.     if e(P) = true then insert P into set S
32.     // Find all specializations of P that have not been tested yet, and evaluate
33.     // them recursively:
34.     Spec := {P' in U-E | P' -> P, P' != P, and there is no P" in U-E, P" != P
35.       and P" != P', with P' -> P" and P" -> P}
36.     foreach P' in Spec { evaluatePatterns(P') }
37.   }
38. }
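The role of the frequency predicate ae(P) as the pruning condition can be illustrated with a small Python sketch. The representation is an assumption made for this example only: haplotypes are equal-length strings, a pattern is a frozenset of (marker index, allele) assignments, P' -> P is read as "P' contains every assignment of P", and an immediate specialization adds exactly one assignment. The function name search_frequent and the toy data are hypothetical.

```python
from typing import Callable, FrozenSet, List, Set, Tuple

Pattern = FrozenSet[Tuple[int, str]]  # assumed: set of (marker index, allele)


def frequency(p: Pattern, haps: List[str]) -> int:
    """Number of haplotypes that carry every assignment of the pattern."""
    return sum(all(h[i] == a for i, a in p) for h in haps)


def search_frequent(haps: List[str], e: Callable[[Pattern], bool],
                    x: int) -> Set[Pattern]:
    n_markers = len(haps[0])
    alleles = [sorted({h[i] for h in haps}) for i in range(n_markers)]
    accepted: Set[Pattern] = set()   # S
    evaluated: Set[Pattern] = set()  # E

    def evaluate(p: Pattern) -> None:
        evaluated.add(p)
        if frequency(p, haps) <= x:
            return  # ae(P) is false; specializations cannot be more frequent
        if e(p):
            accepted.add(p)
        used = {i for i, _ in p}
        for i in range(n_markers):
            if i in used:
                continue
            for a in alleles[i]:
                q = p | {(i, a)}  # immediate specialization of p
                if q not in evaluated:
                    evaluate(q)

    # The most general patterns are the single assignments.
    for i in range(n_markers):
        for a in alleles[i]:
            p = frozenset({(i, a)})
            if p not in evaluated:
                evaluate(p)
    return accepted


haps = ["AB", "AB", "Ab", "aB"]
frequent = search_frequent(haps, lambda p: True, x=1)
pairs_only = search_frequent(haps, lambda p: len(p) == 2, x=1)
```

Note that e(P) only decides whether a frequent pattern is reported; a frequent pattern that fails e is still specialized further, as in steps 30 to 36.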

6. A method of claim 1, wherein the marker patterns P are searched by the
following algorithm:

Input

• marker map M = (m_1, ..., m_k)

• phenotype vector Y = (Y_1, ..., Y_n)

• haplotype matrix H of size n * k

• association threshold x for chi-squared test

• maximum pattern length l

• maximum number of gaps g

• maximum gap size s

Output

• set S = {P in U | e(P) is true} of patterns,

• where U consists of patterns on M that consist of marker-allele assignments and
that adhere to parameters l, g, and s, and

• where e(P) is true if and only if chi-squared test on P using haplotype matrix H
and phenotypes Y exceeds the given threshold x

Method

39. S := {}
40. // Number of case and control chromosomes:
41. pi_A := number of disease-associated chromosomes;
42. pi_C := number of control chromosomes;
43. pi := pi_A + pi_C
44. // A lower bound for pattern frequency:
45. lb := pi_A * pi * x / (pi_C * pi + pi_A * x)
46. // Variable for iterating over different patterns:
47. P = (p_1, ..., p_k) := ('*', ..., '*')
48. for i := 1 to k {
49.   // alleles(m_i) is the set of alleles of the i:th marker
50.   foreach a in alleles(m_i) {
51.     p_i := a
52.     // Test pattern P and all its extensions:
53.     checkPatterns(P, i, i, 0, 0)
54.     // Reset p_i
55.     p_i := '*'
56.   }
57. }
58. end

59. // Test haplotype pattern P and all patterns that can be generated by extending P
60. // from the right:
61. procedure checkPatterns(P, start, i, nr_of_gaps, gap_length) {
62.   // Output strongly associated patterns
63.   if chi-squared(P, M, H, Y) >= x and p_i != '*' then insert P into set S
64.   // Return if extended patterns would be too long:
65.   if i = k or i+1-start > l then return
66.   // Return if extended patterns can not be strongly disease-associated:
67.   if frequency of P in disease-associated chromosomes is less than lb
68.     then return;
69.   // Create and test legal extensions of current pattern P (3 cases):
70.   // 1. Give marker i+1 all possible values:
71.   foreach a in alleles(m_{i+1}) {
72.     p_{i+1} := a
73.     checkPatterns(P, start, i+1, nr_of_gaps, 0)
74.   }
75.   // 2. Introduce a new gap starting at marker i+1:
76.   if p_i != '*' and nr_of_gaps < g and s > 1 then {
77.     p_{i+1} := '*'
78.     checkPatterns(P, start, i+1, nr_of_gaps+1, 1)
79.   }
80.   // 3. Extend the current gap over marker i+1:
81.   if p_i = '*' and gap_length < s then {
82.     p_{i+1} := '*'
83.     checkPatterns(P, start, i+1, nr_of_gaps, gap_length+1)
84.   }
85.   // Before returning, reset p_{i+1}:
86.   p_{i+1} := '*'
87.   return
88. }
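The recursion of steps 39 to 88 can be sketched in Python roughly as follows. This is an illustrative transcription under assumptions: haplotypes are equal-length strings, the phenotype is a list of booleans (True for disease-associated chromosomes), the test statistic is the unsigned Pearson chi-squared of the 2x2 table without continuity correction, and indices are 0-based. The comparisons of steps 65 and 76 are transcribed literally. The names mine_patterns, chi_squared and counts are hypothetical.

```python
from typing import List, Sequence, Set, Tuple

WILDCARD = "*"


def matches(pattern: Sequence[str], hap: str) -> bool:
    return all(p == WILDCARD or p == h for p, h in zip(pattern, hap))


def counts(pattern, haps, is_case) -> Tuple[int, int]:
    """Numbers of case and control chromosomes that match the pattern."""
    a = sum(1 for h, c in zip(haps, is_case) if c and matches(pattern, h))
    b = sum(1 for h, c in zip(haps, is_case) if not c and matches(pattern, h))
    return a, b


def chi_squared(pattern, haps, is_case) -> float:
    a, b = counts(pattern, haps, is_case)
    n_case = sum(is_case)
    n_ctrl = len(haps) - n_case
    c, d = n_case - a, n_ctrl - b
    denom = (a + b) * (c + d) * n_case * n_ctrl
    return 0.0 if denom == 0 else len(haps) * (a * d - b * c) ** 2 / denom


def mine_patterns(haps: List[str], is_case: List[bool], x: float,
                  max_len: int, max_gaps: int, max_gap_size: int) -> Set[tuple]:
    k, n = len(haps[0]), len(haps)
    alleles = [sorted({h[i] for h in haps}) for i in range(k)]
    n_case = sum(is_case)
    n_ctrl = n - n_case
    lb = n_case * n * x / (n_ctrl * n + n_case * x)  # step 45
    found: Set[tuple] = set()
    pattern = [WILDCARD] * k

    def check(start: int, i: int, n_gaps: int, gap_len: int) -> None:
        if pattern[i] != WILDCARD and chi_squared(pattern, haps, is_case) >= x:
            found.add(tuple(pattern))                      # step 63
        if i == k - 1 or i + 1 - start > max_len:          # step 65, as written
            return
        if counts(pattern, haps, is_case)[0] < lb:         # steps 67-68
            return
        for a in alleles[i + 1]:                           # case 1
            pattern[i + 1] = a
            check(start, i + 1, n_gaps, 0)
        if pattern[i] != WILDCARD and n_gaps < max_gaps and max_gap_size > 1:
            pattern[i + 1] = WILDCARD                      # case 2, as written
            check(start, i + 1, n_gaps + 1, 1)
        if pattern[i] == WILDCARD and gap_len < max_gap_size:
            pattern[i + 1] = WILDCARD                      # case 3
            check(start, i + 1, n_gaps, gap_len + 1)
        pattern[i + 1] = WILDCARD                          # step 86

    for i in range(k):
        for a in alleles[i]:
            pattern[i] = a
            check(i, i, 0, 0)
            pattern[i] = WILDCARD
    return found


haps = ["11", "11", "12", "22", "21", "22"]
is_case = [True, True, True, False, False, False]
found = mine_patterns(haps, is_case, x=3.0, max_len=2, max_gaps=1, max_gap_size=2)
```

Because the statistic in this sketch is unsigned, the pattern ('2', '*'), which occurs only in controls, is also reported; it is not extended further because its frequency among cases falls below lb.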

7. A method of claim 1, wherein the marker patterns P are searched by the
following algorithm:

Input

• set U of marker patterns

• evaluation function e(P) for patterns P in U

• (generalization) relation < for patterns in U, where the function e and the
relation < are such that if e(P) is true and P' < P, then e(P') is also true

Output

• set S = {P in U | e(P) is true} of patterns

Definitions

• function Lgg: U -> 2^U, Lgg(P) = {P' in U | P > P' and P' != P and there is no
P" in U such that P != P" != P' and P > P" > P'}, the set of least general
generalizations of pattern P.

• function Lss: U -> 2^U, Lss(P) = {P' in U | P < P' and P' != P and there is no
P" in U such that P != P" != P' and P < P" < P'}, the set of least special
specializations of pattern P.

Method

89. S := {}
90. Q := {}
91. // Start with the most general patterns:
92. F := {P in U | there is no P' in U, P' != P, such that P' < P};
93. while F != {} {
94.   // Evaluate the candidate patterns:
95.   foreach P in F {
96.     if e(P) = true then insert P into set S
97.     else remove P from set F
98.   }
99.   Q := Q union F
100.   // Generate a new set of candidate patterns:
101.   C := {}
102.   foreach P in F {
103.     C := C union {P' in U | P' in Lss(P) and for all P" in Lgg(P'):
104.       P" in Q}
105.   }
106.   F := C
107. }
108. end
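The level-wise search of steps 89 to 108 can be sketched in Python as below. The sketch assumes, beyond what the claim states, that patterns are frozensets of strings, that P' < P means "P' is a proper subset of P", and that U is closed under removing single elements (apart from the empty set), so that Lgg and Lss reduce to removing or adding one element. Instead of deleting from F during iteration, the sketch rebuilds the set.

```python
from itertools import combinations
from typing import Callable, FrozenSet, List, Set

Pattern = FrozenSet[str]  # assumed representation of a marker pattern


def lgg(p: Pattern, universe: Set[Pattern]) -> Set[Pattern]:
    """Least general generalizations: drop one element (assumed lattice)."""
    return {p - {item} for item in p if (p - {item}) in universe}


def lss(p: Pattern, universe: Set[Pattern], items: List[str]) -> Set[Pattern]:
    """Least special specializations: add one element (assumed lattice)."""
    return {p | {item} for item in items
            if item not in p and (p | {item}) in universe}


def search_levelwise(universe: Set[Pattern], items: List[str],
                     e: Callable[[Pattern], bool]) -> Set[Pattern]:
    accepted: Set[Pattern] = set()   # S
    qualified: Set[Pattern] = set()  # Q
    frontier = {p for p in universe if not lgg(p, universe)}  # F: most general
    while frontier:
        frontier = {p for p in frontier if e(p)}   # steps 95-98
        accepted |= frontier
        qualified |= frontier                      # step 99
        candidates: Set[Pattern] = set()
        for p in frontier:                         # steps 102-105
            candidates |= {q for q in lss(p, universe, items)
                           if lgg(q, universe) <= qualified}
        frontier = candidates
    return accepted


items = ["a", "b", "c"]
universe = {frozenset(c) for n in range(1, 4) for c in combinations(items, n)}
evaluated = []


def e(p: Pattern) -> bool:
    evaluated.append(p)  # record which patterns are actually tested
    return p <= {"a", "b"} or p <= {"a", "c"}


result = search_levelwise(universe, items, e)
```

In the toy run the pattern {a, b, c} is never evaluated, because its generalization {b, c} failed e and is therefore missing from Q.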



8. A method of claim 1, wherein the marker patterns P are searched by the
following algorithm:

Input

• set U of marker patterns

• evaluation function e(P) for patterns P in U

• frequency threshold x

Output

• set S = {P in U | e(P) and ae(P) is true} of patterns, where ae(P) is true if and
only if the frequency of pattern P exceeds a given threshold x

Definitions

• function Lgg: U -> 2^U, Lgg(P) = {P' in U | P -> P' and P' != P and there is no
P" in U such that P != P" != P' and P -> P" -> P'}, the set of least general
generalizations of pattern P.

• function Lss: U -> 2^U, Lss(P) = {P' in U | P' -> P and P' != P and there is no
P" in U such that P != P" != P' and P' -> P" -> P}, the set of least special
specializations of pattern P.

Method

109. S := {}
110. Q := {}
111. // Start with the most general patterns:
112. F := {P in U | there is no P' in U, P' != P, such that P -> P'};
113. while F != {} {
114.   // Evaluate the candidate patterns:
115.   foreach P in F {
116.     if ae(P) = true then {
117.       if e(P) = true then insert P into set S
118.     }
119.     else remove P from set F
120.   }
121.   Q := Q union F
122.   // Generate a new set of candidate patterns:
123.   C := {}
124.   foreach P in F {
125.     C := C union {P' in U | P' in Lss(P) and for all P" in Lgg(P'):
126.       P" in Q}
127.   }
128.   F := C
129. }
130. end

9. A method of claim 1, wherein

a) the phenotype being studied is qualitative, and

b) the pattern evaluation function e(P) has the form e(P) = true if and only if
e'(P) > x, where e'(P) is the (signed) association measure χ² and x is a
user-specified minimum value, which is chosen so that the sizes of S_i are large
enough, such as 20, to give statistically sufficiently reliable estimates for
the gene locus, and

c) the score s(m_i) of marker m_i is the size of S_i, also called marker-wise
pattern frequency of m_i and denoted by f(m_i).
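A possible reading of the signed association measure and of the marker-wise pattern frequency f(m_i) is sketched below in Python. The sign convention (negative when the pattern is rarer among cases than among controls), the wildcard-tuple pattern representation and the rule that a pattern overlaps every marker between its first and last fixed position are assumptions made for this illustration.

```python
from typing import List, Sequence, Tuple

WILDCARD = "*"


def signed_chi_squared(a: int, b: int, n_case: int, n_ctrl: int) -> float:
    """Chi-squared of the 2x2 table (pattern present/absent x case/control),
    negated when the pattern is rarer among cases than among controls.
    a = cases carrying the pattern, b = controls carrying it."""
    c, d = n_case - a, n_ctrl - b
    denom = (a + b) * (c + d) * n_case * n_ctrl
    if denom == 0:
        return 0.0
    chi2 = (n_case + n_ctrl) * (a * d - b * c) ** 2 / denom
    return chi2 if a * d >= b * c else -chi2


def pattern_frequency(scored_patterns: Sequence[Tuple[Tuple[str, ...], float]],
                      x: float, n_markers: int) -> List[int]:
    """f(m_i): number of patterns with e'(P) > x whose span covers marker i.
    Each pattern is assumed to fix at least one marker."""
    f = [0] * n_markers
    for pattern, stat in scored_patterns:
        if stat <= x:
            continue  # e(P) is false
        fixed = [i for i, allele in enumerate(pattern) if allele != WILDCARD]
        for i in range(fixed[0], fixed[-1] + 1):
            f[i] += 1
    return f


scored = [(("1", "*", "2", "*"), 5.0),
          (("*", "1", "1", "*"), 4.2),
          (("*", "*", "2", "1"), -6.0),
          (("1", "*", "*", "*"), 3.0)]
f = pattern_frequency(scored, x=3.5, n_markers=4)
```

With the signed measure, a pattern that is strongly associated with controls receives a large negative value and is therefore excluded by the threshold x.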

10. A method of claim 1, wherein

a) the pattern evaluation function e(P) has the form e(P) = true if and only if
e'(P) > x, where e'(P) is the absolute frequency of pattern P in the data and
x is a user-specified value, which is chosen so that the sizes of S_i are large
enough, such as 20, to give statistically sufficiently reliable estimates for
the gene locus, and

b) in order to derive the score s(m_i), the p value (statistical significance) of
each marker pattern P in determining the phenotype being studied is evaluated,
and

c) the score s(m_i) is the distance between the observed p value distribution of
patterns in S_i and the uniform distribution, defined as the average of
(p_i - q_i) log(p_i / q_i) over all i = 1..n, where n is the number of haplotype
patterns in S_i, p_i is the ith smallest p value in S_i, and q_i is the expectation
of the ith smallest p value, if the p values were randomly drawn from the uniform
distribution.
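The distance in part (c) can be computed as in the following Python sketch. It relies on the standard result that the ith smallest of n independent uniform(0, 1) draws has expectation i / (n + 1). The natural logarithm is an assumption, since the claim does not state the base, and the function name uniform_distance is hypothetical. The p values must be strictly positive.

```python
import math
from typing import Sequence


def uniform_distance(p_values: Sequence[float]) -> float:
    """Average of (p_i - q_i) * log(p_i / q_i) over the sorted p values,
    where q_i = i / (n + 1) is the expected ith smallest uniform draw."""
    ps = sorted(p_values)
    n = len(ps)
    total = 0.0
    for i, p in enumerate(ps, start=1):
        q = i / (n + 1)
        total += (p - q) * math.log(p / q)  # each term is non-negative
    return total / n


exactly_uniform = uniform_distance([1 / 3, 2 / 3])
near_uniform = uniform_distance([0.3, 0.7])
strong_signal = uniform_distance([0.001, 0.002])
```

Every term is non-negative because (p_i - q_i) and log(p_i / q_i) always have the same sign, so the score grows as the observed p values depart from their uniform expectations in either direction.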

11. A method of claim 10, where the p value is computed using a linear model of
form Y = β_1 X_1 + ... + β_k X_k + αZ + β_0, where the dependent variable Y is the
phenotype being studied, X_1 through X_k are covariates, such as environmental
factors, and Z is a dummy variable for the occurrence of the haplotype pattern, and

the coefficients α and β_i are adjusted for best fit, and then

the significance of Z as a covariate is assessed using a t test with the null hypothesis
"α = 0".

12. A method of claim 1, wherein each score s(m_i) is refined by replacing it by the
marker-wise p value of the score s(m_i), where the statistical significance of s(m_i) is
measured against the null hypothesis that there is no gene effect.

13. A method of claim 12, wherein the marker-wise p values p(m_i) are determined
by randomly permuting phenotypes.
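A permutation procedure of the kind referred to in claims 12 and 13 might look like the Python sketch below. It is an illustration under assumptions: score_fn stands for any function that maps a phenotype vector to the list of marker scores, larger scores are treated as stronger evidence unless stated otherwise, and the add-one correction in the final estimate is a common convention that the claims do not prescribe. All names and the toy data are hypothetical.

```python
import random
from typing import Callable, List, Sequence


def permutation_p_values(score_fn: Callable[[Sequence[int]], List[float]],
                         phenotypes: Sequence[int],
                         n_permutations: int = 1000,
                         higher_is_stronger: bool = True,
                         seed: int = 0) -> List[float]:
    """Marker-wise p values: share of phenotype permutations whose score at a
    marker is at least as extreme as the observed score at that marker."""
    rng = random.Random(seed)
    observed = score_fn(phenotypes)
    extreme = [0] * len(observed)
    shuffled = list(phenotypes)
    for _ in range(n_permutations):
        rng.shuffle(shuffled)  # breaks any genotype-phenotype association
        permuted = score_fn(shuffled)
        for i, (obs, perm) in enumerate(zip(observed, permuted)):
            hit = perm >= obs if higher_is_stronger else perm <= obs
            if hit:
                extreme[i] += 1
    # Assumed add-one correction keeps every estimate strictly positive.
    return [(count + 1) / (n_permutations + 1) for count in extreme]


# Toy data: alleles (0/1) at two markers for eight chromosomes.
markers = [[1, 1, 1, 1, 0, 0, 0, 0],
           [1, 0, 1, 0, 1, 0, 1, 0]]
phenotypes = [1, 1, 1, 1, 0, 0, 0, 0]


def toy_scores(ph: Sequence[int]) -> List[float]:
    """Absolute difference in allele counts between cases and controls."""
    return [abs(sum(a for a, y in zip(col, ph) if y)
                - sum(a for a, y in zip(col, ph) if not y))
            for col in markers]


p_values = permutation_p_values(toy_scores, phenotypes, n_permutations=200)
```

The second marker has an observed score of zero, so every permutation is at least as extreme and its estimated p value is 1, while the first marker, which matches the phenotype perfectly, receives a small p value.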

14. A method of claim 1, wherein the area returned from the prediction of the
gene location is contiguous or fragmented or a point.

15. A method of claim 1, wherein the location of the gene, predicted as a function
of the scores s(m_i) and based on maximizing or minimizing the score, is predicted to
be the location of the marker m_i that maximizes or minimizes the marker score s(m_i).

16. A method of claim 1, wherein the location of the gene, predicted as a function
of the scores s(m_i) and based on maximizing or minimizing the score, is predicted to
be the combination of most probable intervals for containing the trait-susceptibility
locus that covers at most the desired proportion t (t ∈ {0,100%}) of the original region
obtained by taking all such points in the studied chromosomal region whose nearest
marker is within the k best scoring markers, where k is selected such that the resulting
area has length at most t times the length of the studied region, and where k is
the maximal such value.
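The interval construction of this claim can be sketched in Python as follows. The sketch assumes that marker positions are sorted, that the points nearest to a marker form the cell between the midpoints to its neighbours (clipped at the region ends), and that ties between scores are broken by marker order. The function name predicted_region and the toy data are hypothetical.

```python
from typing import List, Sequence, Tuple


def predicted_region(positions: Sequence[float], scores: Sequence[float],
                     region_start: float, region_end: float, t: float,
                     higher_is_closer: bool = True) -> List[Tuple[float, float]]:
    """Union of the cells of the k best markers, with k as large as possible
    while the total length stays within t times the region length."""
    n = len(positions)
    mids = [(positions[i] + positions[i + 1]) / 2 for i in range(n - 1)]
    bounds = [region_start] + mids + [region_end]
    cells = [(bounds[i], bounds[i + 1]) for i in range(n)]
    order = sorted(range(n), key=lambda i: scores[i], reverse=higher_is_closer)
    budget = t * (region_end - region_start)
    chosen, used = [], 0.0
    for i in order:  # add markers from best to worst until the budget is hit
        length = cells[i][1] - cells[i][0]
        if used + length > budget:
            break
        chosen.append(i)
        used += length
    intervals: List[Tuple[float, float]] = []
    for i in sorted(chosen):  # merge cells that touch each other
        start, end = cells[i]
        if intervals and intervals[-1][1] == start:
            intervals[-1] = (intervals[-1][0], end)
        else:
            intervals.append((start, end))
    return intervals


positions = [1.0, 3.0, 5.0, 7.0, 9.0]
scores = [1.0, 5.0, 4.0, 2.0, 6.0]
half = predicted_region(positions, scores, 0.0, 10.0, t=0.5)
three_quarters = predicted_region(positions, scores, 0.0, 10.0, t=0.75)
```

In both toy runs the returned area is fragmented, since the two best-scoring markers are not adjacent on the map.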

17. A method of claim 1, wherein the location of the gene, predicted as a function
of the scores s(m_i) and based on maximizing or minimizing the score, is predicted to
those points in the studied chromosomal region whose nearest marker scores at least
y or at most y, where y is scoring function dependent and is selected so that the
probability of the gene being close to the marker is sufficiently large.

18. A method of claim 1, wherein the location of the gene, predicted as a function
of the scores s(m_i) and based on maximizing or minimizing the score, is determined
by expert investigation of the marker scores or their visualization.

19. A method of claim 1, wherein several genes are searched for simultaneously 
by using marker patterns that refer to several potential gene loci at the same time. 

20. A computer-readable data storage medium having computer-executable program
code stored thereon operative to perform a method of any of the preceding claims
when executed on a computer.

21. A computer system programmed to perform the method of any of claims 1 to
19.



